NATURAL-LANGUAGE SPEECH CONTROL

Technical Field 

The present invention relates generally to the technical
field of digital computer speech recognition and, more
particularly, to recognizing and executing commands spoken in
natural-language.

Background Art 

Currently, humans communicate with a computer primarily
tactilely via keyboard or pointing device with commands that must
strictly conform to computer program syntax. However, speech is
the most natural method for humans to express commands. To
improve speed, usability and user acceptance of computers there
exists a well recognized need for a voice-based command system
that responds appropriately to only a general description of
tasks to be performed by the computer. Some systems have been
demonstrated which permit speaking conventional computer
commands. For example, an MS DOS command for copying all Word for
Windows files in one directory into another directory named
"john" might be spoken as follows.

Copy *.doc john 

However, to be truly effective, a voice-based command system needs
not only to translate a spoken command into a sequence of words,
but also to interpret natural-language sentences such as that set
forth below as a coherent command recognizable to and executable
by the computer.

Copy all word files to John's directory. 

Because natural-language allows a computer user to prescribe
a set of smaller tasks with a single sentence, an ability to
handle high-level, abstract commands is key to an effective
voice-based command system. The ability to handle high-level,
abstract commands makes a voice-based command interface easy to
use and potentially faster than keyboard or pointing device based
computer control. Moreover, under certain circumstances a
voice-based command system is essential for controlling a
computer's operation, such as for physically handicapped
individuals, and for other individuals while performing tasks
which occupy both of their hands.

Voice control of computers allows spoken commands having a
high level of abstraction and complexity. For instance, in
giving directions we might simply say "turn left at the light".
Presently, this type of command is possible only when
communicating with other humans. Communicating with computers or
equipment requires a series of commands at a much lower level of
abstraction. For instance, the previous instruction would at a
minimum need to be expanded as follows.

Go Straight
Find Light
Turn Left
Go Straight

Similarly, for a jet aircraft landing on the deck of an aircraft
carrier, the command "abort landing" would at a minimum translate
to the following set of commands.

Afterburner On
Steady Course
Retract Flaps
Retract Speed Brakes
Retract Landing Gear

Issuing the preceding sequence of commands by voice requires
too much time, and would therefore probably result in a crash.
To be effective, the pilot needs to be able to control the
aircraft with one high-level command, in this case "abort
landing", and the computer must execute all the commands needed
to accomplish this task.

In actual practice, each of the natural-language commands
set forth above needs a set of sub-instructions. Thus, despite
the present ability of computer technology to transcribe speech
into words, real-time voice control of equipment has, thus far,
remained an elusive goal. Conversely, an ability to issue spoken
natural-language commands permits communicating with equipment
ranging from computers to aircraft at a higher level of
abstraction than is presently possible. A natural-language voice
interface will allow applications such as voice control of
vehicles, and voice control of computer applications.

There are three basic approaches to natural-language
syntactic processing: simple grammar, statistical, and
Government-and-Binding-based (GB-based). Of these three
approaches, simple grammars are used for simple, uncomplicated
syntax. Examples of grammars for such a syntax include early work
such as the psychiatrist program 'Eliza'. However, writing a full
grammar for any significant portion of a natural-language is very
complicated. For specialized domains, the grammar based approach
is abandoned for a statistical one as described by Carl G. de
Marcken, Parsing the LOB Corpus, Proceedings of the 28th Annual
Meeting of the Association for Computational Linguistics, June,
1990. Statistical approaches look at word patterns and word
co-occurrence and attempt to parse natural-language sentences
based on the likelihood of such patterns. Statistical approaches
use a variety of methods including neural networks and word
distribution. As with any other statistical pattern matching
approach, this approach is ultimately limited by a ceiling on
accuracy which cannot easily be exceeded. Also, it is very
difficult to handle wide varieties of linguistic phenomena such
as scrambling, NP-movement, binding between question words and
empty categories, etc., through statistical natural-language
processing.

Approaches to natural-language processing based on Noam
Chomsky's Government and Binding theories, as described in Some
Concepts and Consequences of the Theory of Government and Binding,
Cambridge, Mass., MIT Press, offer a possibility of a more robust
approach to natural-language parsing by developing computational
methods based on linguistic theory of a universal language.
Head-driven Phrase Structure Grammar (HPSG) is a major off-shoot
of GB theory and a number of such parsers are being developed.
The GB-based approach can find syntactic structure in scrambled
sentences such as 'I play football' and 'football, I play'. The
GB-based approach also handles NP-movement that is exemplified
by a passive sentence such as 'Football was played.' which has
the deeper structure '[ [] [was played [football]] ]'. In
parsing this natural-language sentence the noun phrase (NP)
'football' moves from its original position after the verb 'was
played' to the front of the sentence, because otherwise the
sentence would have no subject. Binding between question words
and empty categories is exemplified by a question such as 'Whom
will he invite?' The GB approach finds that this sentence has
the deep structure [he will [invite [whom]] ]. The question word
'whom' binds the empty trace that it leaves when it moves to the
front of the sentence.

The principle-based-parsing technique described by Robert
C. Berwick in Principles of Principle-Based Parsing,
Principle-Based Parsing: Computational and Psycholinguistics,
Kluwer Academic Publishers, pp. 1-37 (1991), and by Sandiway
Fong in The Computational Implementation of Principle-Based
Parsers, Principle-Based Parsing: Computational and
Psycholinguistics, Kluwer Academic Publishers, pp. 65-83 (1991),
offers a possibility of a more robust approach.
Principle-based-parsing uses a few principles for filtering
sentences. A sequence of principle based filters eliminates
illegal parses and the remaining parse is the legal one. A
primary difficulty with this method is that it generates too many
parses, which makes the GB-based approach computationally slow.
Methods for improving performance of GB-based parsing include:

1. appropriately sequencing the principle based filters
to reduce over-generation as described by Fong; or

2. 'co-routining' by interleaving the actual parsing
mechanism with the principle filters as described by
Bonnie Jean Dorr in Principle-Based Parsing for
Machine Translation, Principle-Based Parsing:
Computational and Psycholinguistics, Kluwer Academic
Publishers, pp. 153-183 (1991).

Disclosure of Invention

An object of the present invention is to provide a
voice-based command system that can translate commands spoken in
natural-language into commands accepted by a computer program.

Another object of the present invention is to provide a
voice-based command system that can translate commands spoken in
natural-language into commands accepted by different computer
programs.

Another object of the present invention is to provide a
natural-language-syntactic-parser that resolves ambiguities in
a voice command.

Another object of the present invention is to provide a
command interpreter that handles incomplete commands gracefully
by interpreting the command as far as possible, and by retaining
information from the command for subsequent clarification.

Another object of the present invention is to provide a
voice based command system that is efficient in any operating
environment, and that is portable with minor modifications to
other operating environments.

Briefly, the present invention is a natural-language speech
control method that produces a command for controlling the
operation of a digital computer from words spoken in a
natural-language. The method includes the step of processing an
audio signal that represents the spoken words to generate textual
digital-computer-data. The textual digital-computer-data
contains representations of the words in the command spoken in
a natural-language. The textual digital-computer-data is then
processed by a natural-language-syntactic-parser to produce a
parsed sentence. The parsed sentence consists of a string of
words with each word being associated with a part of speech in
the parsed sentence. The string of words is then preferably
processed by a semantic compiler to generate the command that
controls the operation of the digital computer.

The preferred embodiment of the present invention uses a
GB-based natural-language-syntactic-parser which reveals implied
syntactic structure in English language sentences. Hence the
GB-based natural-language-syntactic-parser can resolve ambiguous
syntactic structures better than alternative methods of
natural-language processing. Using a generalized
principles-and-parameters GB-based natural-language-syntactic-parser
for the natural-language speech control method provides a
customizable and portable parser that can be tailored to different
operating environments with modification. With generalized
principles-and-parameters, a GB-based approach can describe a large
syntax and vocabulary relatively easily, and hence provides greater
robustness than other approaches to natural-language processing.

These and other features, objects and advantages will be
understood or apparent to those of ordinary skill in the art from
the following detailed description of the preferred embodiment
as illustrated in the various drawing figures.

Brief Description of Drawings 

FIG. 1 is a flow diagram illustrating the overall approach
to processing spoken, natural-language computer commands with a
natural-language speech control system in accordance with the
present invention;

FIG. 2 is a flow diagram, similar to that depicted in FIG.
1, that illustrates the presently preferred embodiment of the
natural-language speech control system;

FIG. 3 depicts a logical form of a parsing of a sentence
produced by the presently preferred GB-based
principles-and-parameters syntactic parser employed in the
natural-language speech control system depicted in FIG. 2;

FIG. 4 is a flow diagram illustrating how a sentence is
parsed by the presently preferred GB-based
principles-and-parameters syntactic parser employed in the
natural-language speech control system depicted in FIG. 2;

FIG. 5 is a block diagram depicting an alternative
embodiment of a semantic compiler that converts parsed computer
commands into machine code executable as a command to a digital
computer program; and

FIG. 6 is a block diagram depicting a preferred embodiment
of a semantic compiler that converts parsed computer commands
into machine code executable as a command to a digital computer
program.

Best Mode for Carrying Out the Invention 

FIG. 1 depicts a natural-language speech control system in
accordance with the present invention referred to by the general
reference character 20. As illustrated in FIG. 1, the
natural-language speech control system 20 first processes a
spoken command received as an audio signal with a robust
automatic speech recognition computer program 22. The speech
recognition computer program 22 produces textual
digital-computer-data in the form of an ASCII text stream 24 that
contains a text of the spoken words as recognized by the speech
recognition computer program 22. The text stream 24 is then
processed by a syntactic-parser 26 which converts the text stream
24, representing the spoken words, into a parsed sentence having
a logical form 28. The logical form 28 associates a part of
speech in the parsed sentence with each word in a string of
words. The logical form 28 is processed by a semantic compiler
32 to generate a command in the form of a machine code 34 that
is then processed by a computer program executed by a computer
36 to control its operation.

As is readily apparent to those skilled in the art, the
speech recognition computer program 22, syntactic-parser 26 and
semantic compiler 32 will generally be computer programs that are
executed by the computer 36. Similarly, the text stream 24 and
logical form 28 data in general will be stored, either
temporarily or permanently, within the computer 36.

FIG. 2 is a flow diagram that depicts a presently preferred
implementation of the natural-language speech control system 20.
As depicted in FIG. 2, the preferred implementation of the
natural-language speech control system 20 includes an error
message facility 42. The error message facility 42 permits the
natural-language speech control system 20 to inform the speaker
of difficulties that the natural-language speech control system
20 encounters in attempting to process a spoken computer command.
The error message facility 42 informs the speaker about the
processing difficulty either audibly or visibly. In the specific
implementation of the natural-language speech control system 20
depicted in FIG. 2, the machine code 34 produced by the semantic
compiler 32 is an MS DOS command. The computer 36 executes the
MS DOS command to produce a result 44 specified by the spoken
command.

Speech Recognition Computer Program 22 

The speech recognition computer program 22 processes the
audio signal that represents spoken words to generate a string
of words forming the text stream 24. A number of companies have
developed computer programs for transcribing voice into text.
Several companies offering such computer programs are listed
below.

1. BBN, a wholly owned subsidiary of GTE, has a
Unix-based speech recognizer called Hark

2. Dragon Systems markets Dragon Dictate

3. IBM markets VoiceType Dictation

4. Kurzweil Applied Intelligence

5. Microsoft Research's Speech Technology Group is
developing a speech recognition engine named Whisper

6. PureSpeech

7. SRI Corp's STAR Lab has a group developing a
wideband, continuous speech recognizer called DECIPHER

8. AT&T's Advanced Speech Products Group offers a
speech recognizer named WATSON.

Most of the systems identified above work with discrete
speech in which a speaker must pause between words. Also these
systems require some level of speaker training to attain
high-accuracy speech recognition. Ideally, a continuous speech
recognizer that employs a Hidden Markov Model is to be preferred.
Of the systems listed above, Dragon Systems' speech recognizer
seems to be the most robust, has been used by the United States
Armed Forces in Bosnia, and is presently preferred for the
natural-language speech control system 20. The Dragon Systems
speech recognizer runs on an IBM PC compatible computer operating
under the Microsoft Windows graphical user interface. Initial
tests have demonstrated a very high degree of accuracy with a
large number of speakers with unconstrained language and a
variety of accents.

In general, for a single sentence or command the speech
recognition computer program 22 can generate a plurality of
word-vectors. Each word-vector corresponds to one spoken word
in the sentence or computer command. Each word-vector includes
at least one, but probably several, two-tuples consisting of a
word recognized by the speech recognition computer program 22
together with a number which represents a probability estimated
by the speech recognition computer program 22 that the audio
signal actually contains the corresponding spoken word.
Exhaustive processing of a spoken command by the syntactic-parser
26 requires that several strings of words be included in the text
stream 24. Each string of words included in the text stream 24
for such exhaustive processing is assembled by concatenating
successive words selected from successive word-vectors. The
several strings of words in the text stream 24 to be processed
by the syntactic-parser 26 are not identical because in every
string at least one word differs from that in all other strings
of words included in the text stream 24.
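
The assembly of candidate strings from word-vectors can be sketched as follows. This is a minimal illustration, not the recognizer's actual interface: the name candidate_strings, the sample words and probabilities, and the ranking by a product of per-word probabilities (which assumes the words are independent) are all assumptions introduced for this example.

```python
from itertools import product
from math import prod

# A word-vector is a list of (word, probability) two-tuples.
WordVector = list[tuple[str, float]]


def candidate_strings(vectors: list[WordVector]) -> list[tuple[str, float]]:
    """Build every string obtainable by taking one word per word-vector.

    Strings are returned best first. The joint score is the product of
    the per-word probabilities, an independence assumption made only
    for this sketch.
    """
    scored = []
    for combo in product(*vectors):
        text = " ".join(word for word, _ in combo)
        score = prod(p for _, p in combo)
        scored.append((text, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


# Illustrative values, not taken from a real recognizer.
vectors = [
    [("copy", 0.90), ("coffee", 0.40)],
    [("all", 0.80), ("awl", 0.30)],
    [("files", 0.85)],
]
ranked = candidate_strings(vectors)
for text, score in ranked:
    print(f"{score:.3f}  {text}")
```

Every string produced this way differs from every other in at least one word, because each is built from a distinct combination of alternatives. The number of strings grows as the product of the vector lengths, so a practical system would likely cap how many are passed on to the parser.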




Syntactic-Parser Computer Program 26

The syntactic-parser 26 incorporated into the preferred
embodiment of the natural-language speech control system 20 is
based on a principles-and-parameters (P-and-P) syntactic parser,
Principar. Principar has been developed by and is available from
Prof. Dekang Lin at the University of Manitoba in Canada.
P-and-P parsing is based on Noam Chomsky's GB-based theory of
natural-language syntax. Principar's significant advantage over
other natural-language-syntactic-parsers is that with relatively
few rules, it can perform deep parses of complex sentences.

The power of the P-and-P framework can be illustrated by
considering how it can easily parse both Japanese and English
language sentences. In English, typically the word order in a
sentence is subject-verb-object as in 'He loves reading'. But
in Japanese, the order is typically subject-object-verb. Now if
a GB-based parser employs a principle which states that 'sentences
contain subjects, objects and verbs', and the GB-based parser's
parameter for 'word-order' of sentences is subject-verb-object
for English and subject-object-verb for Japanese, the GB-based
parser's principles and parameters describe a grammar for simple
sentences in both English and Japanese. This is the essence of
the P-and-P framework.
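
The split between a shared principle and a per-language parameter can be shown with a toy sketch. It is not how Principar represents its grammar: the names REQUIRED_ROLES, WORD_ORDER and label_roles are hypothetical, and the sketch only handles three-word sentences.

```python
# Principle (shared by both languages): a simple sentence contains a
# subject, a verb and an object.
REQUIRED_ROLES = ("subject", "verb", "object")

# Parameter (set per language): the order in which the roles appear.
WORD_ORDER = {
    "english": ("subject", "verb", "object"),
    "japanese": ("subject", "object", "verb"),
}


def label_roles(words: list[str], language: str) -> dict[str, str]:
    """Assign a role to each word of a three-word sentence."""
    order = WORD_ORDER[language]
    if len(words) != len(REQUIRED_ROLES):
        raise ValueError("sketch only handles subject/verb/object sentences")
    return dict(zip(order, words))


english = label_roles(["he", "loves", "reading"], "english")
# The same constituents in Japanese order; English glosses stand in
# for the Japanese words.
japanese = label_roles(["he", "reading", "loves"], "japanese")
print(english)
print(japanese)
```

Only the WORD_ORDER entry changes between the two languages; the principle and the labelling code are shared, which is the property the text attributes to the P-and-P framework.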

To describe the complex interactions of different sentence
elements, the syntactic-parser 26 depicted in FIG. 2 uses the
following principles.

1. Case Theory: Case theory requires that every overt
noun phrase (NP) be assigned an abstract case, such as
nominative case for subjects, accusative case for
direct objects, dative case for indirect objects, etc.

2. X-bar Theory: X-bar theory describes how the syntactic
structure of a sentence is formed by successively
smaller units called phrases. This theory determines
the word-order in sentences.

3. Movement Theory: The rule Move-α specifies that any
sentence element can be moved from its base position
in the underlying D-structure, to anywhere else in the
surface structure. Whether a particular movement is
allowed depends on other constraints of the grammar.
For example, the result of a movement must satisfy the
X-bar schema.

4. Bounding Theory: This theory prevents the results of
movement from extending too far in the sentence.

5. Binding Theory: This theory describes the structural
relationship between an empty element left behind by
a moved NP and the moved NP itself.

6. θ-Theory: This theory deals with the assignment of
semantic roles to the NPs in a sentence.

The preceding principles, and some other more complex ones that
are described by Robert C. Berwick in Principles of
Principle-Based Parsing, Principle-Based Parsing: Computational
and Psycholinguistics, Kluwer Academic Publishers, pp. 1-37
(1991), are used for parsing English with Principar.

With a GB-based approach to natural-language parsing,
commands to computers can be understood as verb phrases that are
a sub-set of complete English sentences. The sentences have an
implied second person singular pronoun subject and the verb is
active voice present tense. For instance, to resume work on a
previous project, one might issue to a computer the following
natural-language command.

'Edit the first document on nlp-based command interpreters.'

Possible word vectors that the speech recognition computer
program 22 might produce for the preceding sentence are set forth
below.



Spoken word     Recognized word    Probability

edit            edit               0.90
                a-dot              0.50

the             the                0.70
                da                 0.60
                their              0.40
                there              0.40
                them               0.20

first           first              0.80
                force              0.40
                fast               0.30
                force              0.30
                hearse             0.15
                curse              0.15
                purse              0.05

document        document           0.80
                dock-meant         0.40

on              on                 0.75
                hun                0.40
                an                 0.35

nlp             nlp                0.10

based           based              0.75
                baste              0.50
                paste              0.35

command         command            0.90
                come-and           0.55

interpreters    interpreters       0.85
                inter-porter       0.40



A parsing of the preceding sentence by Principar for the
actual words appears in FIG. 3. The GB-based parse presented
in FIG. 3 allows the computer to map a verb (V) into a computer
command action, with the noun phrase (NP) as the object, and the
adjective phrase (AP) as properties of the object.

Limiting GB-based syntactic parsing to only active voice,
second person verb-phrase parsing permits implementing an
efficient semantic compiler 32 that allows operating a computer
with computer transcribed voice commands. Since only a sub-set
of English is used for computer commands, parameters can be set to
limit the number of parses generated by the syntactic-parser 26.
For example, the case principle may be set to only accusative
case for verb-complements, oblique case for prepositional
complements and genitive case for possessive nouns or pronouns.
A nominative case principle is unnecessary since the computer
commands lack an express subject for the main clause. Such
tuning of the principles to be applied by the syntactic-parser
26 significantly reduces the number of unnecessary parses
produced by the GB-based P-and-P syntactic-parser Principar.

By using a GB-based P-and-P syntactic-parser, moving the
natural-language speech control system 20 between computer
applications or between computer platforms involves simply
changing the lexicon and the parameters. Due to the modular
framework of the grammar implemented by the syntactic-parser 26,
with minor changes in parameter settings more complicated
sentences such as the following queries and implicit commands may
be parsed.

'Which files have been modified after July 4th?'
'How many words are there in this document?'
'I would like to delete all files in this directory.'

As illustrated in FIG. 4, the syntactic-parser 26 includes
a set of individual principle-based parsers 52, P1 through Pn, a
dynamic principle-ordering system 54, principle parameters
specifiers 56, and a lexicon specifying system 58. The heart of
the syntactic-parser 26 is the set of individual principle-based
parsers 52. Each of the principle-based parsers 52 implements
an individual principle such as those listed and described above.
Each principle is abstract and is described in a manner different
from each other (i.e. heterogeneous). For instance, the X-bar
theory for English states that the verb must precede the object,
while the θ-theory states that every verb must discharge itself.
The various principle-based parsers 52, each implemented as a
separate computer program module, formalize the preceding
principles. Each principle-based parser 52 applies its principle to
the input text and the legal parses which it receives from the
preceding principle-based parser 52. The principle-based parser
52 then generates a set of legal parses according to the
principle which it formalizes. Because the principle-based
parsers 52 process an input sentence sequentially, the
syntactic-parser 26 employs a set of data structures common to
all the principle-based parsers 52 that allows the input text and
the legal parses to be passed from one principle-based parser 52
to the next. Moreover, the syntactic-parser 26 includes a
principle-ordering system 54 that controls a sequence in which
individual principles, such as those summarized above, are
applied in parsing a text.
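
The sequential hand-off between principle-based parsers can be sketched as a chain of filters over a shared parse representation. The two toy principles, the Parse type and run_principles below are hypothetical stand-ins chosen for illustration; they are far simpler than the linguistic principles Principar formalizes.

```python
from typing import Callable

# A candidate parse is a tuple of (word, part-of-speech) pairs. This
# shared structure is what lets one filter hand its output to the next.
Parse = tuple[tuple[str, str], ...]
Principle = Callable[[list[Parse]], list[Parse]]


def exactly_one_verb(parses: list[Parse]) -> list[Parse]:
    """Toy principle: a command contains exactly one verb."""
    return [p for p in parses if sum(tag == "V" for _, tag in p) == 1]


def verb_precedes_object(parses: list[Parse]) -> list[Parse]:
    """Toy English parameter setting: the verb comes before any noun."""
    legal = []
    for p in parses:
        tags = [tag for _, tag in p]
        if "V" not in tags:
            continue
        verb_at = tags.index("V")
        if all(i > verb_at for i, tag in enumerate(tags) if tag == "N"):
            legal.append(p)
    return legal


def run_principles(candidates: list[Parse], ordering: list[Principle]) -> list[Parse]:
    """Apply each principle in turn, passing survivors to the next one."""
    for principle in ordering:
        candidates = principle(candidates)
        if not candidates:
            break  # nothing legal remains; later principles are skipped
    return candidates


candidates = [
    (("edit", "V"), ("document", "N")),
    (("edit", "N"), ("document", "N")),
    (("edit", "N"), ("document", "V")),
]
legal = run_principles(candidates, [exactly_one_verb, verb_precedes_object])
print(legal)
```

The ordering list plays the role of the principle-ordering system: any order yields the same surviving parses here, but placing the most selective filter first reduces the work done by the filters that follow.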

To parse more than one language, each of the principle-based
parsers 52 receives parameter values from the principle
parameters specifiers 56. For instance, with the X-bar theory, a verb
precedes the object in English, while in Japanese the object
precedes the verb. Consequently, the grammar for each principle
formalized in the principle-based parsers 52 needs to be
dynamically generated, based on parameter values provided by the
principle parameters specifiers 56.

Principar's lexicon specifying system 58 contains over
90,000 entries, extracted out of standard dictionaries. The
structure of the lexicon specifying system 58 is a word-entry
followed by functions representing parts-of-speech categories and
other features. To properly parse computer commands, Principar's
lexicon must be extended by adding recently adopted,
platform-specific computer acronyms.

Semantic Compiler Computer Program 32 

Parsing the text stream 24 into the logical form depicted
in FIG. 3 permits the semantic compiler 32 to use a conventional
LR grammar in generating the machine code 34 from the logical
form 28. Parsing the text stream 24 into the canonical form is
possible because commands are restricted to imperative sentences
that are second person, active voice sentences that begin with
a verb. Limiting the natural-language commands in this way
insignificantly restricts the ability to issue voice commands.
The canonical logical form of a command can be parsed into the
machine code 34 by the semantic compiler 32 using a conventional
lexical analyzer named LEX and a conventional compiler writer
named YACC.

As indicated in FIG. 2, the preferred semantic compiler 32
has an ability to detect some semantic errors, and then send a
message back to a speaker via the error message facility 42 about
the specific nature of the error. An example of a semantic error
would be if an action was requested that was not possible with
the object. For instance, an attempt to copy a directory to a
file would result in an object type mismatch, and therefore cause
an error.
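
A semantic check of this kind can be sketched as a lookup of permitted verb and object-type combinations. The ALLOWED table, the SemanticError class and the report function are assumptions made for this illustration; the source does not specify how the semantic compiler 32 represents object types or phrases its messages.

```python
class SemanticError(Exception):
    """Raised when an action is not possible with the given objects."""


# Hypothetical table of (verb, source type, target type) combinations
# that this sketch accepts.
ALLOWED = {
    ("copy", "file", "file"),
    ("copy", "file", "directory"),
    ("copy", "directory", "directory"),
}


def check_command(verb: str, source_type: str, target_type: str) -> None:
    """Reject a command whose object types do not fit the verb."""
    if (verb, source_type, target_type) not in ALLOWED:
        raise SemanticError(
            f"object type mismatch: cannot {verb} a {source_type} "
            f"to a {target_type}"
        )


def report(verb: str, source_type: str, target_type: str) -> str:
    """Return the text an error message facility might show the speaker."""
    try:
        check_command(verb, source_type, target_type)
    except SemanticError as err:
        return f"error: {err}"
    return "ok"


print(report("copy", "file", "directory"))
print(report("copy", "directory", "file"))
```

Keeping the check separate from the reporting step mirrors the description above, where the semantic compiler detects the error and the error message facility conveys it to the speaker.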

An alternative approach for generating the machine code 34
to the conventional LR grammar described above would be for the
semantic compiler 32 to take parse trees expressed in the
canonical form in the logical form 28 as input and then map them
into appropriate computer commands. This would be done by a
command-interpreter computer program 62 by reference to mapping
tables 64 which map verbs to different actions.

Different computer programs perform the same abstract
natural-language commands for similar operations. However, each
computer program requires different types of commands that need
to be handled uniquely. The conventional LR grammar, or the
combined command-interpreter computer program 62 and mapping
tables 64, permit the semantic compiler 32 to prepare operating
system commands 72, word processing commands 74, spreadsheet
commands 76, and/or database commands 78 from the parse trees in
the logical form 28. Note that the command-interpreter computer
program 62 needs to have different functionality depending on the
application domain to which the command is addressed. If a
computer command is directed to DOS or a Unix command shell, the
operating system can directly execute the machine code 34. But
the word processing commands 74, the spreadsheet commands 76, or
the database commands 78 must be piped through the operating
system to that specific application. To facilitate this kind of
command, the natural-language speech control system 20 must run
in the background piping the machine code 34 to the current
application.
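
The mapping-table approach can be sketched as one verb-to-template table per application domain. The MAPPING_TABLES contents, the interpret function and the argument names source and target are hypothetical; only the idea of mapping verbs to domain-specific actions comes from the description above.

```python
# Hypothetical mapping tables: one verb-to-command template per domain.
MAPPING_TABLES = {
    "dos": {
        "copy": "COPY {source} {target}",
        "delete": "DEL {source}",
    },
    "unix": {
        "copy": "cp {source} {target}",
        "delete": "rm {source}",
    },
}


def interpret(verb: str, arguments: dict[str, str], domain: str) -> str:
    """Map a parsed verb and its arguments to a domain-specific command."""
    table = MAPPING_TABLES[domain]
    if verb not in table:
        raise KeyError(f"no mapping for verb {verb!r} in domain {domain!r}")
    return table[verb].format(**arguments)


# "Copy all word files to John's directory", after parsing, might reduce
# to a verb plus these arguments (an assumed intermediate form).
arguments = {"source": "*.doc", "target": "john"}
print(interpret("copy", arguments, "dos"))
print(interpret("copy", arguments, "unix"))
```

Supporting another application in this sketch means adding a table rather than changing interpret, which fits the observation that the same abstract command needs different handling in each application domain.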

Industrial Applicability 

In adapting the natural-language speech control system 20
for preparing commands for execution by a variety of computer
programs, the speech recognition computer program 22 and the
syntactic-parser 26 are the same regardless of the computer
program that will execute the command. However, as depicted in
FIG. 6, the semantic compiler 32 includes a set of semantic
modules 84 used for generating commands that control different
computer programs. Of these semantic modules 84, there is a set
of semantic modules 84 to prepare commands for controlling
operating system functions. Other optional semantic modules 84
generate commands for controlling operation of different
application computer programs such as the word processing commands 74,
spreadsheet commands 76 and database commands 78 illustrated in
FIG. 5. In addition, the semantic compiler 32 includes a set of
semantic modules 84 for configuration, and for loading each
specific application computer program.

Although the present invention has been described in terms
of the presently preferred embodiment, it is to be understood
that such disclosure is purely illustrative and is not to be
interpreted as limiting. While preferably the text stream 24
represents a spoken command with an ASCII text stream, as is
readily apparent to those skilled in the art any digital computer
representation of textual digital-computer-data may be used for
expressing such data in the text stream 24. Similarly, while
preferably the semantic compiler 32 employs a canonical logical
form to represent computer commands parsed by the semantic
compiler 32, any other representation of the parsed computer
commands that provides the same informational content may be used
in the semantic compiler 32 for expressing parsed commands.
Consequently, without departing from the spirit and scope of the
invention, various alterations, modifications, and/or alternative
applications of the invention will, no doubt, be suggested to
those skilled in the art after having read the preceding
disclosure. Accordingly, it is intended that the following claims
be interpreted as encompassing all alterations, modifications,
or alternative applications as fall within the true spirit and
scope of the invention.

The Claims

What is claimed is: 

1. A universal voice-command-interpretation method for
producing from spoken words a command that is adapted for
controlling operation of a digital computer, the method comprising
the steps of:

receiving an audio signal that represents the spoken words; 

processing the received audio signal to generate therefrom 
textual digital-computer-data that contains representations of
individual spoken words; 

processing the textual digital-computer-data with a 
natural-language-syntactic-parser to produce a parsed sentence 
that consists of a string of words with each word being associated
with a part of speech in the parsed sentence; and

generating the command from the parsed sentence. 

2. The method of claim 1 wherein the parsed sentence has
a syntax of an implied second person singular pronoun subject and
an active voice present tense verb.

3. The method of claim 1 wherein processing of the audio 
signal to generate therefrom the textual digital-computer-data 
produces a plurality of word-vectors. 

4. The method of claim 3 wherein each word-vector includes 
at least one two-tuple consisting of a word together with a 
number which represents a probability that the audio signal 
actually contains that spoken word. 

5. The method of claim 3 wherein the textual 
digital-computer-data processed by the 
natural-language-syntactic-parser consists of a string of words, 
each successive word being selected from successive word-vectors
in the plurality of word-vectors.

6. The method of claim 5 wherein in producing the command 
the natural-language-syntactic-parser processes at least two 
unidentical strings of words in which at least one word is
different.

7. The method of claim 1 wherein the 
natural-language-syntactic-parser is a government-and-binding-
based (GB-based) natural-language-syntactic-parser.

8. The method of claim 7 wherein the GB-based 
natural-language-syntactic-parser is a principles-and-parameters 
(P-and-P) syntactic parser.

9. The method of claim 1 wherein the command is generated 
from the parsed sentence by a semantic compiler.

10. The method of claim 9 wherein the semantic compiler 
uses an LR grammar in generating the command.

11. The method of claim 10 wherein the semantic compiler 
upon detecting a semantic error dispatches a message that 
describes the semantic error. 

12. The method of claim 11 wherein the message describing 
the semantic error that is dispatched by the semantic compiler 
is presented audibly to a speaker. 

13. The method of claim 11 wherein the message describing 
the semantic error that is dispatched by the semantic compiler 
is presented visibly to a speaker. 

14. The method of claim 10 wherein the semantic compiler 
includes a plurality of semantic modules that respectively 
generate commands for controlling operation of different computer 
programs. 

15. The method of claim 14 wherein the semantic compiler 
includes at least one semantic module that generates operating
system commands. 

16. The method of claim 14 wherein the semantic compiler
includes at least one semantic module that generates
application program commands.

17. The method of claim 14 wherein the semantic compiler
includes at least one semantic module that generates
configuration commands.

18. The method of claim 14 wherein the semantic compiler
includes at least one semantic module that generates program
loading commands.



19. The method of claim 1 further comprising the step of 
transmitting the command to the digital computer. 

NATURAL-LANGUAGE SPEECH CONTROL

Technical Field 

The present invention relates generally to the technical
field of digital computer speech recognition and, more
particularly, to recognizing and executing commands spoken in
natural-language.

Background Art 

Currently, humans communicate with a computer primarily
tactilely via keyboard or pointing device with commands that must
strictly conform to computer program syntax. However, speech is 
the most natural method for humans to express commands. To 
improve speed, usability and user acceptance of computers there
exists a well-recognized need for a voice-based command system
that responds appropriately to only a general description of 
tasks to be performed by the computer. Some systems have been 
demonstrated which permit speaking conventional computer 
commands. For example, an MS DOS command for copying all Word for
Windows files in one directory into another directory named
"john" might be spoken as follows. 

Copy *.doc john

However, to be truly effective, a voice-based command system needs
not only to translate a spoken command into a sequence of words,
but also to interpret natural-language sentences such as that set
forth below as a coherent command recognizable to and executable 
by the computer. 

Copy all word files to John's directory. 

Because natural-language allows a computer user to prescribe
a set of smaller tasks with a single sentence, an ability to 
handle high-level, abstract commands is key to an effective 
voice-based command system. The ability to handle high-level, 
abstract commands makes a voice-based command interface easy to
use and potentially faster than keyboard or pointing device based 
computer control. Moreover, under certain circumstances a voice- 
based command system is essential for controlling a computer's 
operation such as for physically handicapped individuals, and for 
normal individuals while performing tasks which occupy both of
their hands. 

Voice control of computers allows speech at a high level
of abstraction and complexity in the command. For instance, in
giving directions we might simply say "turn left at the light."
Presently, this type of command is possible only when
communicating with other humans. Communicating with computers or
equipment requires a series of commands at a much lower level of
abstraction. For instance, the previous instruction would at a
minimum need to be expanded as follows.

Go Straight
Find Light
Turn Left
Go Straight

Similarly for a jet aircraft landing on a deck of an aircraft
carrier, the command "abort landing" would at a minimum translate
to the following set of commands.

Afterburner On
Steady Course
Retract Flaps
Retract Speed Brakes
Retract Landing Gear

Issuing the preceding sequence of commands by voice requires
too much time, and would therefore probably result in a crash.
To be effective, the pilot needs to be able to control the
aircraft with one high-level command, in this case "abort
landing," and the computer must execute all the commands needed
to accomplish this task.

In actual practice, each of the natural-language commands
set forth above needs a set of sub-instructions. Thus, despite
the present ability of computer technology to transcribe speech
into words, real-time voice control of equipment has, thus far,
remained an elusive goal. Conversely, an ability to issue spoken
natural-language commands permits communicating with equipment
ranging from computers to aircraft at a higher level of
abstraction than is presently possible. A natural-language voice
interface will allow applications such as voice control of
vehicles, and voice control of computer applications.

There are three basic approaches to natural-language
syntactic processing: simple grammar, statistical, and
Government-and-Binding-based (GB-based). Of these three approaches,
simple grammars are used for simple, uncomplicated syntax.
Examples of grammars for such a syntax include early work such
as the psychiatrist program 'Eliza'. However, writing a full
grammar for any significant portion of a natural-language is very
complicated. For specialized domains, the grammar based approach
is abandoned for a statistical one as described by Carl G. de
Marcken, Parsing the LOB Corpus, Proceedings of the 28th Annual
Meeting of the Association for Computational Linguistics, June,
1990. Statistical approaches look at word patterns and word
co-occurrence and attempt to parse natural-language sentences
based on the likelihood of such patterns. Statistical approaches
use a variety of methods including neural networks and word
distribution. As with any other statistical pattern matching
approach, this approach is ultimately limited by a ceiling on
accuracy which cannot be easily exceeded. Also, it is very
difficult to handle wide varieties of linguistic phenomena such
as scrambling, NP-movement, binding between question words and
empty categories, etc., through statistical natural-language
processing.

Approaches to natural-language processing based on Noam
Chomsky's Government and Binding theories as described in Some
Concepts and Consequences of the Theory of Government and Binding,
Cambridge, Mass., MIT Press, offer a possibility of a more robust
approach to natural-language parsing by developing computational
methods based on linguistic theory of a universal language.
Head-driven Phrase Structure Grammar (HPSG) is a major off-shoot
of GB theory and a number of such parsers are being developed.
The GB-based approach can find syntactic structure in scrambled
sentences such as 'I play football' and 'football, I play'. The
GB-based approach also handles NP-movement that is exemplified
by a passive sentence such as 'Football was played.', which has
the deeper structure '[[] [was played [football]]]'. In
parsing this natural-language sentence the noun phrase (NP)
'football' moves from its original position after the verb 'was
played' to the front of the sentence, because otherwise the
sentence would have no subject. Binding between question words
and empty categories is exemplified by a question such as 'Whom
will he invite?' The GB approach finds that this sentence has
the deep structure [he will [invite [whom]]]. The question word
'whom' binds the empty trace that it leaves when it moves to the
front of the sentence.

The principle-based-parsing technique described by Robert
C. Berwick, in Principles of Principle-Based Parsing,
Principle-Based Parsing: Computational and Psycholinguistics,
Kluwer Academic Publishers, pp. 1-37 (1991), and by Sandiway
Fong in The Computational Implementation of Principle-Based
Parsers, Principle-Based Parsing: Computational and
Psycholinguistics, Kluwer Academic Publishers, pp. 65-83 (1991),
offers a possibility of a more robust approach.
Principle-based-parsing uses a few principles for filtering
sentences. A sequence of principle based filters eliminates
illegal parses and the remaining parse is the legal one. A
primary difficulty with this method is that it generates too many
parses, which makes the GB-based approach computationally slow.
Methods for improving performance of GB-based parsing include:

1. appropriately sequencing the principle based filters
to reduce over-generation as described by Sandiway
Fong; or

2. 'co-routining' by interleaving the actual parsing
mechanism with the principle filters as described by
Bonnie Jean Dorr in Principle-Based Parsing for
Machine Translation, Principle-Based Parsing:
Computational and Psycholinguistics, Kluwer Academic
Publishers, pp. 153-183 (1991).

Disclosure of Invention

An object of the present invention is to provide a voice-
based command system that can translate commands spoken in
natural-language into commands accepted by a computer program.

Another object of the present invention is to provide a voice-
based command system that can translate commands spoken in
natural-language into commands accepted by different computer
programs.

Another object of the present invention is to provide a
natural-language-syntactic-parser that resolves ambiguities in
a voice command.

Another object of the present invention is to provide a 
command interpreter that handles incomplete commands gracefully 
by interpreting the command as far as possible, and by retaining
information from the command for subsequent clarification.

Another object of the present invention is to provide a 
voice-based command system that is efficient in any operating
environment, and that is portable with minor modifications to 
other operating environments. 

Briefly, the present invention is a natural-language speech
control method that produces a command for controlling the
operation of a digital computer from words spoken in a natural-
language. The method includes the step of processing an audio
signal that represents the spoken words to generate textual
digital-computer-data. The textual digital-computer-data
contains representations of the words in the command spoken in
a natural-language. The textual digital-computer-data is then
processed by a natural-language-syntactic-parser to produce a
parsed sentence. The parsed sentence consists of a string of
words with each word being associated with a part of speech in
the parsed sentence. The string of words is then preferably
processed by a semantic compiler to generate the command that
controls the operation of the digital computer.

The preferred embodiment of the present invention uses a
GB-based natural-language-syntactic-parser which reveals implied
syntactic structure in English language sentences. Hence the
GB-based natural-language-syntactic-parser can resolve ambiguous
syntactic structures better than alternative methods of natural-
language processing. Using a generalized principles-and-
parameters GB-based natural-language-syntactic-parser for the
natural-language speech control method provides a customizable
and portable parser that can be tailored to different operating
environments with minor modification. With generalized
principles-and-parameters, a GB-based approach can describe a large
syntax and vocabulary relatively easily, and hence provides greater
robustness than other approaches to natural-language processing.

These and other features, objects, and advantages will be 
understood or apparent to those of ordinary skill in the art from
the following detailed description of the preferred embodiment
as illustrated in the various drawing figures. 

Brief Description of Drawings 

FIG. 1 is a flow diagram illustrating the overall approach
to processing spoken, natural-language computer commands with a
natural-language speech control system in accordance with the 
present invention; 

FIG. 2 is a flow diagram, similar to that depicted in FIG.
1, that illustrates the presently preferred embodiment of the 
natural-language speech control system; 

FIG. 3 depicts a logical form of a parsing of a sentence
produced by the presently preferred GB-based principles-and-
parameters syntactic parser employed in the natural-language
speech control system depicted in FIG. 2; 

FIG. 4 is a flow diagram illustrating how a sentence is 
parsed by the presently preferred GB-based principles-and-parameters
syntactic parser employed in the natural-language speech
control system depicted in FIG. 2; 

FIG. 5 is a block diagram depicting an alternative embodi- 
ment of a semantic compiler that converts parsed computer 
commands into machine code executable as a command to a digital 
computer program; and

FIG. 6 is a block diagram depicting a preferred embodiment
of a semantic compiler that converts parsed computer commands 
into machine code executable as a command to a digital computer 
program. 

Best Mode for Carrying Out the Invention 

FIG. 1 depicts a natural-language speech control system in 
accordance with the present invention referred to by the general 
reference character 20. As illustrated in FIG. 1, the 
natural-language speech control system 20 first processes a
spoken command received as an audio signal with a robust 
automatic speech recognition computer program 22. The speech 
recognition computer program 22 produces textual digital- 
computer-data in the form of an ASCII text stream 24 that 
contains a text of the spoken words as recognized by the speech
recognition computer program 22. The text stream 24 is then 
processed by a syntactic-parser 26 which converts the text stream 
24, representing the spoken words, into a parsed sentence having 
a logical form 28. The logical form 28 associates a part of
speech in the parsed sentence with each word in a string of 
words. The logical form 28 is processed by a semantic compiler 
32 to generate a command in the form of a machine code 34 that 
is then processed by a computer program executed by a computer
36 to control its operation.

As is readily apparent to those skilled in the art, the 
speech recognition computer program 22, syntactic-parser 26 and 
semantic compiler 32 will generally be computer programs that are 
executed by the computer 36. Similarly, the text stream 24 and
logical form 28 data in general will be stored, either
temporarily or permanently, within the computer 36.
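
By way of illustration only, the flow of FIG. 1 may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the preferred embodiment: the function names, the tiny LEXICON, and the rule that the verb becomes the action and the nouns become its arguments are all hypothetical, and the "audio signal" is simply a text string standing in for recognized speech.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon (word -> part of speech); not Principar's lexicon.
LEXICON = {"copy": "V", "delete": "V", "to": "P", "report.doc": "N", "backup": "N"}


@dataclass
class ParsedWord:
    word: str
    part_of_speech: str


def recognize(audio_signal: str) -> str:
    """Stand-in for speech recognition program 22: the 'audio' is already text."""
    return audio_signal.lower()


def parse(text_stream: str) -> list:
    """Stand-in for syntactic-parser 26: associate each word with a part of speech."""
    return [ParsedWord(w, LEXICON.get(w, "N")) for w in text_stream.split()]


def compile_command(logical_form: list) -> str:
    """Stand-in for semantic compiler 32: verb -> action, nouns -> arguments."""
    verbs = [p.word for p in logical_form if p.part_of_speech == "V"]
    nouns = [p.word for p in logical_form if p.part_of_speech == "N"]
    if len(verbs) != 1:
        raise ValueError("expected exactly one verb in the command")
    return " ".join([verbs[0].upper(), *nouns])


def speech_to_command(audio_signal: str) -> str:
    """Chain the three stages in the order shown in FIG. 1."""
    return compile_command(parse(recognize(audio_signal)))
```

Each stage consumes only the output of the stage before it, which mirrors the statement above that the text stream 24 and the logical form 28 are intermediate data held within the computer 36.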

FIG. 2 is a flow diagram that depicts a presently preferred 
implementation of the natural-language speech control system 20. 
As depicted in FIG. 2, the preferred implementation of the
natural-language speech control system 20 includes an error
message facility 42. The error message facility 42 permits the 
natural-language speech control system 20 to inform the speaker 
of difficulties that the natural-language speech control system 
20 encounters in attempting to process a spoken computer command.
The error message facility 42 informs the speaker about the
processing difficulty either audibly or visibly. In the specific 
implementation of the natural-language speech control system 20 
depicted in FIG. 2, the machine code 34 produced by the semantic 
compiler 32 is an MS DOS command. The computer 36 executes the 
MS DOS command to produce a result 44 specified by the spoken
command. 



Speech Recognition Computer Program 22 

The speech recognition computer program 22 processes the
audio signal that represents spoken words to generate a string
of words forming the text stream 24. A number of companies have
developed computer programs for transcribing voice into text.
Several companies offering such computer programs are listed
below.

1. BBN, a wholly owned subsidiary of GTE, has a
Unix-based speech recognizer called Hark

2. Dragon Systems markets Dragon Dictate

3. IBM markets VoiceType Dictation

4. Kurzweil Applied Intelligence

5. Microsoft Research's Speech Technology Group is
developing a speech recognition engine named Whisper

6. PureSpeech

7. SRI Corpus STAR Lab has a group developing a
wideband, continuous speech recognizer called DECIPHER

8. AT&T's Advanced Speech Products Group offers a
speech recognizer named WATSON.



Most of the systems identified above work with discrete
speech in which a speaker must pause between words. Also these
systems require some level of speaker training to attain high-
accuracy speech recognition. Ideally, a continuous speech
recognizer that employs a Hidden Markov Model is to be preferred.
Of the systems listed above, the Dragon Systems speech recognizer
seems to be the most robust, has been used by the United States
Armed Forces in Bosnia, and is presently preferred for the
natural-language speech control system 20. The Dragon Systems
speech recognizer runs on an IBM PC compatible computer operating
under the Microsoft Windows graphical user interface. Initial 
tests have demonstrated a very high degree of accuracy with a 
large number of speakers with unconstrained language and a 
variety of accents. 

In general, for a single sentence or command the speech
recognition computer program 22 can generate a plurality of
word-vectors. Each word-vector corresponds to one spoken word
in the sentence or computer command. Each word-vector includes
at least one, but probably several, two-tuples consisting of a
word recognized by the speech recognition computer program 22
together with a number which represents a probability estimated
by the speech recognition computer program 22 that the audio
signal actually contains the corresponding spoken word.
Exhaustive processing of a spoken command by the syntactic-parser
26 requires that several strings of words be included in the text
stream 24. Each string of words included in the text stream 24
for such exhaustive processing is assembled by concatenating
successive words selected from successive word-vectors. The
several strings of words in the text stream 24 to be processed
by the syntactic-parser 26 are not identical because in every
string at least one word differs from that in all other strings
of words included in the text stream 24.
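
The assembly of candidate strings from successive word-vectors described above may be sketched as follows. The sketch assumes that each word-vector is a list of (word, probability) two-tuples and that candidate strings are ranked by the product of their per-word numbers; that ranking rule and the names candidate_strings and limit are assumptions made for illustration, since the text does not state how the strings are ordered.

```python
from itertools import product
from math import prod


def candidate_strings(word_vectors, limit=None):
    """Concatenate one word from each successive word-vector.

    word_vectors: list of word-vectors, each a list of (word, probability)
    two-tuples. Returns (string, score) pairs, highest score first. The score
    is the product of the per-word numbers (an assumed ranking rule).
    """
    scored = []
    for combination in product(*word_vectors):
        words = [word for word, _ in combination]
        score = prod(probability for _, probability in combination)
        scored.append((" ".join(words), score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored if limit is None else scored[:limit]


# Two word-vectors with illustrative two-tuples.
EXAMPLE_VECTORS = [
    [("document", 0.80), ("dock-meant", 0.40)],
    [("on", 0.75), ("hun", 0.40), ("an", 0.35)],
]
```

Because every string takes exactly one word from each word-vector, any two strings produced this way differ in at least one word, as the paragraph above requires. The number of strings grows as the product of the word-vector lengths, which is why a limit argument is included in the sketch.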

Syntactic-Parser Computer Program 26

The syntactic-parser 26 incorporated into the preferred 
embodiment of the natural-language speech control system 20 is 
based on a principles-and-parameters (P-and-P) syntactic parser,
Principar. Principar has been developed by and is available from
Prof. Dekang Lin at the University of Manitoba in Canada.
P-and-P parsing is based on Noam Chomsky's GB-based theory of
natural-language syntax. Principar's significant advantage over
other natural-language-syntactic-parsers is that with relatively
few rules, it can perform deep parses of complex sentences.

The power of the P-and-P framework can be illustrated by 
considering how it can easily parse both Japanese and English 
language sentences. In English, typically the word order in a 
sentence is subject-verb-object as in 'He loves reading'. But 
in Japanese, the order is typically subject-object-verb. Now if
a GB-based parser employs a principle which states that 'sentences
contain subjects, objects and verbs', and the GB-based parser's
parameter for 'word-order' of sentences is subject-verb-object
for English and subject-object-verb for Japanese, the GB-based
parser's principles and parameters describe a grammar for simple
sentences in both English and Japanese. This is the essence of 
the P-and-P framework. 
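
The division of labor between a shared principle and a per-language parameter may be sketched as follows. This is a toy illustration only: the names ROLES, WORD_ORDER and assign_roles are hypothetical, sentences are limited to exactly three words, and the "Japanese" example uses English glosses placed in subject-object-verb order rather than actual Japanese.

```python
# Principle shared by every language in this toy: a simple sentence
# contains exactly a subject, a verb and an object.
ROLES = frozenset({"subject", "verb", "object"})

# Parameter: the order in which those roles surface in each language.
WORD_ORDER = {
    "english": ("subject", "verb", "object"),
    "japanese": ("subject", "object", "verb"),
}


def assign_roles(words, language):
    """Label each word of a three-word sentence with its grammatical role."""
    order = WORD_ORDER[language]
    if set(order) != ROLES:
        raise ValueError("word-order parameter violates the shared principle")
    if len(words) != len(order):
        raise ValueError("toy grammar handles exactly three-word sentences")
    return dict(zip(order, words))
```

For example, assign_roles(["he", "loves", "reading"], "english") and assign_roles(["he", "reading", "loves"], "japanese") yield the same role assignment, because only the parameter differs between the two calls while the principle is unchanged.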

To describe the complex interactions of different sentence 
elements, the syntactic-parser 26 depicted in FIG. 2 uses the 
following principles.

1. Case Theory: Case theory requires that every overt 
noun phrase (NP) be assigned an abstract case, such as 
nominative case for subjects, accusative case for 
direct objects, dative case for indirect objects, etc. 

2. X-bar Theory: X-bar theory describes how the syntactic
structure of a sentence is formed by successively
smaller units called phrases. This theory determines
the word-order in sentences.

3. Movement Theory: The rule Move-α specifies that any
sentence element can be moved from its base position
in the underlying D-structure, to anywhere else in the
surface structure. Whether a particular movement is
allowed depends on other constraints of the grammar.
For example, the result of a movement must satisfy the
X-bar schema.

4. Bounding Theory: This theory prevents the results of
movement from extending too far in the sentence.

5. Binding Theory: This theory describes the structural
relationship between an empty element left behind by
a moved NP and the moved NP itself.

6. θ-Theory: This theory deals with the assignment of
semantic roles to the NPs in a sentence.

The preceding principles, and some other more complex ones, that
are described by Robert C. Berwick, in Principles of
Principle-Based Parsing, Principle-Based Parsing: Computational
and Psycholinguistics, Kluwer Academic Publishers, pp. 1-37
(1991), are used for parsing English with Principar.

With a GB-based approach to natural-language parsing, 
commands to computers can be understood as verb phrases that are
a sub-set of complete English sentences. The sentences have an 
implied second person singular pronoun subject and the verb is 
active voice present tense. For instance, to resume work on a 
previous project, one might issue to a computer the following
natural-language command. 



'Edit the first document on nip-based command interpreters.' 

Possible word vectors that the speech recognition computer 
program 22 might produce for the preceding sentence are set forth 
below.



Spoken word      Recognized word    Probability

edit             [entries illegible in source]

the              [leading entries illegible in source]
                 da                 0.60
                 their              0.40
                 there              0.40
                 them               0.20

first            first              [illegible]
                 force              0.40
                 fast               0.30
                 force              0.30
                 hearse             0.15
                 curse              0.15
                 purse              0.05

document         document           0.80
                 dock-meant         0.40

on               on                 0.75
                 hun                0.40
                 an                 0.35

nip              nip                0.10

based            based              0.75
                 baste              0.50
                 paste              0.35

command          command            0.90
                 come-and           0.55

interpreters     interpreters       0.85
                 inter-porter       0.40



A parsing of the preceding sentence by Principar for the 
actual words appears in FIG. 3. The GB-based parse presented 
in FIG. 3 allows the computer to map a verb (V) into a computer 
command action, with the noun phrase (NP) as the object, and the
adjective phrase (AP) as properties of the object. 

Limiting GB-based syntactic parsing to only active voice,
second person verb-phrase parsing, permits implementing an 
efficient semantic compiler 32 that allows operating a computer 
with computer transcribed voice commands. Since only a sub-set
of English is used for computer commands, parameters can be set to
limit the number of parses generated by the syntactic-parser 26.
For example, the case principle may be set to only accusative
case for verb-complements, oblique case for prepositional
complements and genitive case for possessive nouns or pronouns.
A nominative case principle is unnecessary since the computer
commands lack an express subject for the main clause. Such 
tuning of the principles to be applied by the syntactic-parser 
26 significantly reduces the number of unnecessary parses 
produced by the GB-based P-and-P syntactic-parser Principar. 

By using a GB-based P-and-P syntactic-parser, moving the
natural-language speech control system 20 between computer 
applications or between computer platforms involves simply 
changing the lexicon, and the parameters. Due to the modular 
framework of the grammar implemented by the syntactic-parser 26,
with minor changes in parameter settings more complicated
sentences such as the following queries and implicit commands may 
be parsed. 

'Which files have been modified after July 4th?'
'How many words are there in this document?'
'I would like to delete all files in this directory.'

As illustrated in FIG. 4, the syntactic-parser 26 includes
a set of individual principle-based parsers 52 P1 through Pn, a
dynamic principle-ordering system 54, principle parameters
specifiers 56, and a lexicon specifying system 58. The heart of
the syntactic-parser 26 is the set of individual principle-based
parsers 52. Each of the principle-based parsers 52 implements
an individual principle such as those listed and described above.
Each principle is abstract and is described in a manner different
from each other (i.e. heterogeneous). For instance, the X-bar
theory for English states that the verb must precede the object,
while the θ-theory states that every verb must discharge its
θ-roles.
The various principle-based parsers 52, each implemented as a
separate computer program module, formalize the preceding
principles. Each principle-based parser 52 applies its principle to
the input text and the legal parses which it receives from the
preceding principle-based parser 52. The principle-based parser
52 then generates a set of legal parses according to the
principle which it formalizes. Because the principle-based
parsers 52 process an input sentence sequentially, the
syntactic-parser 26 employs a set of data structures common to
all the principle-based parsers 52 that allows the input text and
the legal parses to be passed from one principle-based parser 52
to the next. Moreover, the syntactic-parser 26 includes a
principle-ordering system 54 that controls a sequence in which
individual principles, such as those summarized above, are
applied in parsing a text.
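
The sequential application of principles to a shared set of candidate parses may be sketched as a chain of filters. The sketch below is a simplification under stated assumptions: a parse is reduced to a tuple of (word, tag) pairs, the two principle functions are hypothetical stand-ins rather than Principar's formalizations, and the ordering is supplied as a plain list rather than computed dynamically as by the principle-ordering system 54.

```python
from typing import Callable, Iterable, List, Tuple

Parse = Tuple[Tuple[str, str], ...]   # ((word, tag), ...): the shared data structure
Principle = Callable[[Parse], bool]   # returns True when the parse is legal


def has_one_verb(parse: Parse) -> bool:
    """Toy principle: a command contains exactly one verb."""
    return sum(1 for _, tag in parse if tag == "V") == 1


def verb_precedes_object(parse: Parse) -> bool:
    """Toy word-order principle for English: the verb comes before the noun."""
    tags = [tag for _, tag in parse]
    return "V" in tags and "N" in tags and tags.index("V") < tags.index("N")


def apply_principles(candidates: Iterable[Parse],
                     ordering: List[Principle]) -> List[Parse]:
    """Pass the surviving parses from each principle to the next in order."""
    survivors = list(candidates)
    for principle in ordering:
        survivors = [parse for parse in survivors if principle(parse)]
        if not survivors:      # nothing left to filter; stop early
            break
    return survivors


CANDIDATES: List[Parse] = [
    (("edit", "V"), ("document", "N")),
    (("edit", "N"), ("document", "N")),
    (("edit", "V"), ("document", "V")),
    (("document", "N"), ("edit", "V")),
]
```

In this toy the surviving set is the same whichever principle runs first; what the ordering changes is how many candidates each later principle has to examine, which is the cost that sequencing the filters is meant to reduce.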

To parse more than one language, each of the principle-based 
parsers 52 receives parameter values from the principle parame- 
ters specifiers 56. For instance, with the X-bar theory, a verb 
precedes the object in English, while in Japanese the object
precedes the verb. Consequently, the grammar for each principle
formalized in the principle-based parsers 52 needs to be dynami- 
cally generated, based on parameter values provided by the 
principle parameters specifiers 56. 

Principar's lexicon specifying system 58 contains over
90,000 entries, extracted out of standard dictionaries. The
structure of the lexicon specifying system 58 is a word-entry
followed by functions representing parts-of-speech categories and
other features. To properly parse computer commands, Principar's
lexicon must be extended by adding recently adopted,
platform-specific computer acronyms.

Semantic Compiler Computer Program 32 

Parsing the text stream 24 into the logical form depicted
in FIG. 3 permits the semantic compiler 32 to use a conventional
LR grammar in generating the machine code 34 from the logical
form 28. Parsing the text stream 24 into the canonical form is
possible because commands are restricted to imperative sentences
that are second person, active voice sentences that begin with
a verb. Limiting the natural-language commands in this way
insignificantly restricts the ability to issue voice commands.
The canonical logical form of a command can be parsed into the
machine code 34 by the semantic compiler 32 using a conventional
lexical analyzer named LEX and a conventional compiler-compiler
named YACC.

As indicated in FIG. 2, the preferred semantic compiler 32 
has an ability to detect some semantic errors, and then send a 
message back to a speaker via the error message facility 42 about 
the specific nature of the error. An example of a semantic error
would be if an action was requested that was not possible with
the object. For instance an attempt to copy a directory to a
file would result in an object type mismatch, and therefore cause
an error.
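
The object-type check described above may be sketched as a lookup of permitted type pairs. The table ALLOWED_TYPES, its contents, and the function name semantic_error are hypothetical and serve only to illustrate how a mismatch could be turned into a message for the error message facility 42.

```python
from typing import Optional

# Hypothetical table: for each action, the (source type, target type)
# pairs that the action accepts.
ALLOWED_TYPES = {
    "copy": {
        ("file", "file"),
        ("file", "directory"),
        ("directory", "directory"),
    },
}


def semantic_error(action: str, source_type: str, target_type: str) -> Optional[str]:
    """Return a message describing the semantic error, or None if there is none."""
    allowed = ALLOWED_TYPES.get(action)
    if allowed is None:
        return f"unknown action '{action}'"
    if (source_type, target_type) not in allowed:
        return f"cannot {action} a {source_type} to a {target_type}"
    return None
```

Returning a message rather than halting lets the caller choose whether to present the error audibly or visibly, in keeping with the two presentation modes mentioned for the error message facility 42.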

An alternative approach for generating the machine code 34
to the conventional LR grammar described above would be for the
semantic compiler 32 to take parse trees expressed in the
canonical form in the logical form 28 as input and then map them
into appropriate computer commands. This would be done by a
command-interpreter computer program 62 by reference to mapping
tables 64 which map verbs to different actions.
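
A mapping table of the kind just described may be sketched as a dictionary keyed by target environment and verb. The table contents and the function name interpret are illustrative assumptions; the command names shown (COPY, DEL and REN for DOS; cp, rm and mv for a Unix shell) are ordinary commands of those environments, but the text does not specify the contents of the mapping tables 64.

```python
# Hypothetical mapping tables: target environment -> verb -> action.
MAPPING_TABLES = {
    "dos": {"copy": "COPY", "delete": "DEL", "rename": "REN"},
    "unix": {"copy": "cp", "delete": "rm", "rename": "mv"},
}


def interpret(verb, objects, target):
    """Map a parsed verb and its objects to a command line for the target."""
    table = MAPPING_TABLES[target]
    action = table.get(verb.lower())
    if action is None:
        raise KeyError(f"no mapping for verb '{verb}' on target '{target}'")
    return " ".join([action, *objects])
```

With these tables, interpret("Copy", ["*.doc", "john"], "dos") yields "COPY *.doc john", while the same verb and objects with the target "unix" yield "cp *.doc john". Supporting another environment then amounts to adding a table rather than changing the interpreter.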

Different computer programs perform the same abstract 
natural-language commands for similar operations. However, each 
computer program requires different types of commands that need 
to be handled uniquely. The conventional LR grammar, or the 
combined command-interpreter computer program 62 and mapping 
tables 64, permits the semantic compiler 32 to prepare operating 
system commands 72, word processing commands 74, spreadsheet 
commands 76, and/or database commands 78 from the parse trees in 
the logical form 28. Note that the command-interpreter computer 
program 62 needs to have different functionality depending on the 
application domain to which the command is addressed. If a 
computer command is directed to DOS or a Unix command shell, the 
operating system can directly execute the machine code 34. But 
the word processing commands 74, the spreadsheet commands 76, or 
the database commands 78 must be piped through the operating 
system to that specific application. To facilitate this kind of 
command, the natural-language speech control system 20 must run 
in the background, piping the machine code 34 to the current 
application. 
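
The routing just described, in which operating system commands are executed directly while other commands are piped to the current application, could be sketched in Python along the following lines. The `Router` class, the domain labels, and the sample word processing command `open report.doc` are hypothetical; an in-memory text stream stands in for a real pipe to a running application.

```python
import io

# Hypothetical domain labels for the four command families named above.
OPERATING_SYSTEM = "operating system"
WORD_PROCESSING = "word processing"
SPREADSHEET = "spreadsheet"
DATABASE = "database"


class Router:
    """Send OS commands to the shell; pipe everything else to an application."""

    def __init__(self, run_in_shell, application_pipes):
        self.run_in_shell = run_in_shell            # callable taking a command string
        self.application_pipes = application_pipes  # domain -> writable text stream

    def dispatch(self, domain, command):
        if domain == OPERATING_SYSTEM:
            # The operating system executes these commands directly.
            self.run_in_shell(command)
            return
        pipe = self.application_pipes.get(domain)
        if pipe is None:
            raise LookupError(f"no running application accepts {domain} commands")
        pipe.write(command + "\n")


executed = []                   # records what would be run by the shell
word_processor = io.StringIO()  # stands in for a pipe to the application
router = Router(executed.append, {WORD_PROCESSING: word_processor})
router.dispatch(OPERATING_SYSTEM, "copy *.doc john")
router.dispatch(WORD_PROCESSING, "open report.doc")
```

Passing the shell runner and the pipes in as arguments keeps the sketch free of any platform-specific process handling, which a working system would have to supply.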

Industrial Applicability 
In adapting the natural-language speech control system 20 
for preparing commands for execution by a variety of computer 
programs, the speech recognition computer program 22 and the 
syntactic-parser 26 are the same regardless of the computer 
program that will execute the command. However, as depicted in 
FIG. 6, the semantic compiler 32 includes a set of semantic 
modules 84 used for generating commands that control different 
computer programs. One set of these semantic modules 84 prepares 
commands for controlling operating system functions. Other, 
optional semantic modules 84 generate commands for controlling 
operation of different application computer programs, such as the 
word processing commands 74, spreadsheet commands 76 and database 
commands 78 illustrated in FIG. 5. In addition, the semantic 
compiler 32 includes a set of semantic modules 84 for configuration, 
and for loading each specific application computer program. 

Although the present invention has been described in terms 
of the presently preferred embodiment, it is to be understood 
that such disclosure is purely illustrative and is not to be 
interpreted as limiting. While preferably the text stream 24 
represents a spoken command with an ASCII text stream, as is 
readily apparent to those skilled in the art any digital computer 
representation of textual digital-computer-data may be used for 
expressing such data in the text stream 24. Similarly, while 
preferably the semantic compiler 32 employs a canonical logical 
form to represent computer commands parsed by the semantic 
compiler 32, any other representation of the parsed computer 
commands that provides the same informational content may be used 
in the semantic compiler 32 for expressing parsed commands. 
Consequently, without departing from the spirit and scope of the 
invention, various alterations, modifications, and/or alternative 
applications of the invention will, no doubt, be suggested to 
those skilled in the art after having read the preceding 
disclosure. Accordingly, it is intended that the following claims 
be interpreted as encompassing all alterations, modifications, 
or alternative applications as fall within the true spirit and 
scope of the invention. 



The Claims 

What is claimed is: 






1. A universal voice-command-interpretation method for 
producing from spoken words a command that is adapted for 
controlling operation of a digital computer, the method comprising 
the steps of: 

receiving an audio signal that represents the spoken words; 

processing the received audio signal to generate therefrom 
textual digital-computer-data that contains representations of 
individual spoken words; 

processing the textual digital-computer-data with a 
natural-language-syntactic-parser to produce a parsed sentence 
that consists of a string of words with each word being associated 
with a part of speech in the parsed sentence; and 

generating the command from the parsed sentence. 

2. The method of claim 1 wherein the parsed sentence has 
a syntax of an implied second person singular pronoun subject and 
an active voice present tense verb. 

3. The method of claim 1 wherein processing of the audio 
signal to generate therefrom the textual digital-computer-data 
produces a plurality of word-vectors. 

4. The method of claim 3 wherein each word-vector includes 
at least one two-tuple consisting of a word together with a 
number which represents a probability that the audio signal 
actually contains that spoken word. 

5. The method of claim 3 wherein the textual 
digital-computer-data processed by the 
natural-language-syntactic-parser consists of a string of words, 
each successive word being selected from successive word-vectors 
in the plurality of word-vectors. 

6. The method of claim 5 wherein in producing the command 
the natural-language-syntactic-parser processes at least two 
non-identical strings of words in which at least one word is 
different. 

7. The method of claim 1 wherein the 
natural-language-syntactic-parser is a government-and-binding-based 
(GB-based) natural-language-syntactic-parser. 

8. The method of claim 7 wherein the GB-based 
natural-language-syntactic-parser is a principles-and-parameters 
(P-and-P) syntactic parser. 

9. The method of claim 1 wherein the command is generated 
from the parsed sentence by a semantic compiler. 

10. The method of claim 9 wherein the semantic compiler 
uses an LR grammar in generating the command. 

11. The method of claim 10 wherein the semantic compiler 
upon detecting a semantic error dispatches a message that 
describes the semantic error. 

12. The method of claim 11 wherein the message describing 
the semantic error that is dispatched by the semantic compiler 
is presented audibly to a speaker. 

13. The method of claim 11 wherein the message describing 
the semantic error that is dispatched by the semantic compiler 
is presented visibly to a speaker. 

14. The method of claim 10 wherein the semantic compiler 
includes a plurality of semantic modules that respectively 
generate commands for controlling operation of different computer 
programs. 

15. The method of claim 14 wherein the semantic compiler 
includes at least one semantic module that generates operating 
system commands. 



16. The method of claim 14 wherein the semantic compiler 
includes at least one semantic module that generates application 
program commands. 

17. The method of claim 14 wherein the semantic compiler 
includes at least one semantic module that generates configuration 
commands. 

18. The method of claim 14 wherein the semantic compiler 
includes at least one semantic module that generates program 
loading commands. 

19. The method of claim 1 further comprising the step of 
transmitting the command to the digital computer. 